Use a list of environment variables for JVM options.#444
Conversation
| ENV PYTHONPATH ${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH} | ||
|
|
||
| CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ | ||
| env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ |
There was a problem hiding this comment.
This is pretty strange and is fairly dependent on the shell being Bash instead of sh. It would be good to make this compatible with /bin/sh but I couldn't think of a great way to handle this sanely without using arrays.
Separately, the usage of a temporary file is also sub-par. I found that when piping the result of env... sed into the readarray command via the <<< operator, readarray treated the entire input string as a single element of the array, making it such that the resulting array only had one large JVM option. I also tried piping the results of the sed command into readarray but that also isn't working for me in my testing.
I would appreciate any feedback or suggestions from those who are experienced at shell scripting.
There was a problem hiding this comment.
I'd also appreciate thoughts on how to avoid bouncing this off a temp file. What's there now is ugly but works, so if someone knows how to do this where it's pretty and works, that'd be even better
There was a problem hiding this comment.
cc @ifilonenko @erikerlandson @foxish for shell scripting suggestions
There was a problem hiding this comment.
readarray can be at the end of the pipe, which avoids the temp file. That's at least slightly cleaner.
There was a problem hiding this comment.
Piping to readarray didn't work for me; I believe the result was an empty array in the environment variable. But perhaps I was writing it incorrectly? I had this:
env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' | readarray -t SPARK_DRIVER_JAVA_OPTS
There was a problem hiding this comment.
Weird, I'd expect that to be equivalent to the code using the temp file.
ash211
left a comment
There was a problem hiding this comment.
Probably worth calling out in the PR message that this is a bugfix where previously space-separated values of spark.{driver,executor}.extraJavaOptions were ignored.
This change looks good to me -- I'd like to merge by EOD Monday since this is an important bugfix that we found in testing.
| .set("spark.logConf", "true") | ||
| .set( | ||
| org.apache.spark.internal.config.DRIVER_JAVA_OPTIONS, | ||
| "-XX:+|-HeapDumpOnOutOfMemoryError") |
There was a problem hiding this comment.
what was this +|- for before? afaik that's not valid syntax
There was a problem hiding this comment.
Not sure - probably just a badly written unit test.
| Seq.empty[String]) | ||
| } | ||
|
|
||
| test("Setting JVM options on the driver and executors with spaces.") { |
There was a problem hiding this comment.
right now this PR only tests drivers -- the executors is coming in the followup PR
There was a problem hiding this comment.
I'm ok with keeping the name as-is for now though given that the next PR is imminent.
| ENV PYTHONPATH ${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.4-src.zip:${PYTHONPATH} | ||
|
|
||
| CMD SPARK_CLASSPATH="${SPARK_HOME}/jars/*" && \ | ||
| env | grep SPARK_JAVA_OPT_ | sed 's/[^=]*=\(.*\)/\1/g' > /tmp/java_opts.txt && \ |
There was a problem hiding this comment.
I'd also appreciate thoughts on how to avoid bouncing this off a temp file. What's there now is ugly but works, so if someone knows how to do this where it's pretty and works, that'd be even better
…es' into support-java-options-with-spaces
…-k8s#444) * Use a list of environment variables for JVM options. * Fix merge conflicts.
Closes #440.
As opposed to using a single environment variables string, which does not delimit in the shell properly. Requires upgrading the Docker image to match the submission client.